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Abstract 

We present a RELM forecast of future earthquakes in California that is 
primarily based on the pattern informatics (PI) method. This method 
identifies regions that have systematic fluctuations in seismicity, and it 
has been demonstrated to be successful. A PI forecast map originally 
published on 19 February 2002 for southern California successfully fore- 
cast the locations of sixteen of eighteen M>5 earthquakes during the 
past three years. The method has also been successfully applied to Japan 
and on a worldwide basis. An alternative approach to earthquake fore- 
casting is the relative intensity (RI) method. The RI forecast map is 
based on recent levels of seismic activity of small earthquakes. Recent 
advances in the PI method show considerable improvement, particularly 
when compared with the RI method using relative operating character- 
istic (ROC) diagrams for binary forecasts. The RELM application re- 
quires a probability for each location for a number of magnitude bins 
over a five year period. We have therefore constructed a hybrid fore- 
cast in which we combine the PI method with the RI method to com- 
pute a map of probabilities for events occurring at any location, rather 
than just the most probable locations. These probabilities are further 
converted, using Gutenberg-Richter scaling laws, to anticipated rates of 
future earthquakes that can be evaluated using the RELM test. 

1 Introduction 

There have been a wide variety of approaches app lied to the forecast- 
ing of earthquakes i Turcotte, 1991 ; Kanamori, 2003). These approaches 
can be divided into two general classes; the first is based on empirical 
observations of precursory changes. Examples include precursory seis- 
mic activity, precursory ground motions, and many others. The second 
approach is based on statistical patterns of seismicity. Neither approach 
has been able to provide reliable short-term forecasts (days to months) 
on a consistent basis. 

Although short-term predictions are not available, long-term seismic- 
hazard assessments can be made. A large fraction of all earthquakes oc- 
cur in the vicinity of plate boundaries, although some do occur in plate 
interiors. It is also possible to assess the long-term probability of having 
an earthquake of a specified magnitude in a specified region. These as- 
sessments are primarily based on the hypothesis that future earthquakes 
will oc cur in regions where pas t, typically large, earthquakes have oc- 
curred iKossobokov et all f2000). As we will discuss, a more promising 
approach is to begin with the hypothesis that the rate of occurrence of 
small earthquakes in a region can be analyzed to assess the probability 
of occurrence of much larger earthquakes. 

The RELM forecast described i n this paper is primarily based on 
the pattern informatics (PI) method i Rundle et al. , 2002 ; TiamDO et al. , 
l2002datlRundle et all [2003). This method identifies regions of strongly 
correlated fluctuations in seismic activity. These regions are the loca- 
tions where subsequent large earthquakes have been shown to occur, 



therefore indicating a strong association with the high stress preced- 
ing the main shock. The fluctuations in seismicity rate revealed in a 
PI map have been found to be related to the preparation process for large 
earthquakes. Seismic quiescence an d seismic activation i Bowma n et all 
1998; Wvss and Habermann, 1988), which are revealed by the PI map, 
are examples of such preparation processes. The PI method identifies 
the existence of correlated regions of seismicity in observational data 
that precede the main shock by months and years. The fact that this cor- 
related region locates the aftershocks as well as main shocks leads us 
to identify this region of correlated seismicity with the region of corre- 
lated high stress i Tiampo et al., 2002b c a). Finally, our results with the 
PI map indicate that the occurrences of future significant earthquakes 
are better forecasted by a change (correlated fluctuation) in the average 
seismicity rate rather than with the high seismicity rate itself. 

The PI method does not predict earthquakes, rather it forecasts the 
regions (hotspots) where earthquakes are most likely to occur in the rel- 
atively near future (typically five to ten years). The objective is to re- 
duce the areas of earthquake risk relative to those given by long-term 
hazard assessments. The result is a map of areas in a seismogenic re- 
gion (hotspots) where earthquakes are likely to occur during a specified 
period in the future. In this paper a PI map is combined with historic 
seismicity data to produce a map of probabilities for future large events. 
These probabilities can be further converted, using Gutenberg-Richter 
scaling laws, to forecast rates of occurrence of future earthquakes in spe- 
cific magnitude ranges. This forecast can be evaluated using the RELM 
model. In the following we present details of the PI method and the pro- 
cedure for producing a composite forecast map. A discussion on binary 
forecasts and forecast verification techniques is given in the appendix. 



2 The PI method 

Our approach divides the seismogenic region to be studied into a grid 
of square boxes, or "pixels", whose size is related to the magnitude 
of the earthquakes to be forecast. The rates of seismicity in each box 
are studied to quantify anomalous behavior. The basic idea is that any 
seismicity precursors represent changes, either a local increase or de- 
crease of seismic activity, so our method identifies the locations in which 
these changes are most significant during a predefined change interval. 
The subsequent forecast interval is the five year time window during 
which the forecast is valid. The box size is selected to be consistent 
with the correlation l ength associated with accelerated seismic activity 
iBowman etailll998l) , and the minimum earthquake magnitude consid- 
ered is the lower limit of sensitivity and completeness of the network in 
the region under consideration. The PI method as applied to California 
in this paper is composed of the following steps: 

1. The seismically active region is binned into boxes of size 0.1° x 
0.1° and all events having M > 3.0 are used. These boxes are la- 
beled Xi . This is also the box size specified for the RELM forecast. 
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2. The seismicity obtained from the ANSS catalog for each day in 
each box is uniformly spread over that box and the eight imme- 
diately adjacent boxes (the Moore neighborhood i Moore, 1962)). 
The resulting smoothed intensities for each box is a time series. 

3. Only the top 10% most active boxes are considered. These are the 
boxes with the most M c > 3.0 earthquakes during the period to = 
1 January 1950 to t 2 = 1 August 2005. M c is the cutoff magnitude 
for the analysis. 

4. Each time series is normalized in time by subtracting the temporal 
mean and dividing by the temporal standard deviation. 

5. Each time series is then normalized in space for each value of time 
by subtracting the spatial mean and dividing by the spatial standard 
deviation. 

6. Two intensity maps I±(xi, t;,, ti), I 2 (xi , ti,, t 2 ) are computed by 
averaging all the time series from an initial time, t b to t\ where 
£o < tb < tit an d then from t b to ti. Here to = 1 January 1950, 
ti = 1 January 1985, and t 2 = 1 August 2005. 

7. The intensity change AI(xi,ti > ,ti,t2) = I 2 (xi, t&, t 2 ) — 
Ii(xi,tf,,ti) is computed at each location and absolute value is 
taken \AI(xi,t b ,ti,t 2 )\. 

8. The average of | AI(xi, tj,, t±, t%)\ over all values of to < tj> < 
tmax is then computed. 

9. Finally, the mean squared change < | AI(xi, t;,, ti, ti)\ > 2 is 
computed. 

Note that steps (2), (3), (7 ), and (8) have been modified from the orig- 
inal, published alg orithm iRundle et all 12003: iTiampo et all |2002dat 
iRundle et alll2003t> . 

Hotspot pixels are defined to be the regions where APi (to, ti , t 2 ) 
is larger than some threshold value in the interval [0, 1]. In these re- 
gions, Pi (to , 1 1 , t 2 ) is larger than the average value for all boxes (the 
background level). Note that since the intensities are squared in defin- 
ing probabilities the hotspots may be due to either increases of seismic 
activity during the change time interval (activation) or due to decreases 
(quiescence). We hypothesize that earthquakes with magnitudes larger 
than M c + 2 will occur preferentially in hotspots during the forecast time 
interval t 2 to t$ . Note that this is a binary forecast: either an earthquake 
is forecast to occur or it is forecast not to occur. 

3 Relative intensity 

An alternative approach to earthquake forecasting is to use the rate of 
occurrence of small earthquakes in the past. We refer to this type of 
forecast as a relative intensity (RI) forecast. In such a forecast, the study 
region is again tiled with boxes of size 0.1° X 0.1°. The number of 
earthquakes with magnitude M > 3.0 in each box is determined over 
the time period from to to t 2 . The RI score for each box is then com- 
puted as the total number of earthquakes in the box in the time period 
divided by the value for the box having the largest value. In order to cre- 
ate a binary forecast, a threshold value in the interval [0, 1] is selected. 
Large earthquakes having M > 5 are then considered possible only in 
boxes having an RI value larger than the threshold. The physical justi- 
fication for this type of forecast is that large earthquakes are considered 
most likely to occur at sites of high seismic activity. In this paper we 
combine our binary PI forecast with a continuum RI forecast in order to 
create our continuum RELM forecast. 



4 Binary versus continuum forecasts 

The earthquake forecast make by the PI method is a binary forecast. 
An earthquake is forecast to occur in the hotspot regions and not to 
occur in the other regions, analogous to the issuance of tornado warn- 
ings. An extensive methodology has been developed in the atmo- 
spheric sciences for forecast verification. A standard approach uses 
contingency tables and relative operating characteristic (ROC) diagrams 
ijolliffe and Stephenson, 2003). An example of binary forecast con- 
struction and verification is presented in the appendix. 

The alternative to binary forecasts is a continuum forecast. The like- 
lihood of an earthquake throughout the entire region is specified, anal- 
ogous to temperature forecasts in the atmospheric sciences. A common 
approach to testing the validity of these forecasts is the maximum like- 
lihood test. Kagan and Jackson 1 2000 ) were the first to apply this test 
to earthquake forecasts. The maximum likelihood test is not appropriate 
for the verification of binary forecasts because they are overly sensitive 
to the least probable events. For example, consider two forecasts. The 
first perfectly forecasts 99 out of 100 events but assigns zero probability 
to the last event. The second assigns zero probability to all 100 events. 
Under a likelihood test, both forecasts will have the same skill score of 
— oo. Furthermore, a naive forecast that assigns uniform probability to 
all possible sites will always score higher than a forecast that misses only 
a single event but is otherwise superior. 

5 Creating the forecast map 

The PI method finds regions where earthquakes are most likely to occur 
during a future time window. In order to create a forecast map suitable 
for RELM testing, we combined the PI map with the RI map to create a 
probability map. This map is then renormalized to unit probability and 
scaled by the total number of M > 5 earthquakes expected over the 
future five year period. The details of this procedure are as follows: 

1 . We first create a relative intensity map for the entire region to be 
considered. Data was taken from the ANSS on-line catalog for the 
years 1950 to 2005. This data was then truncated such that relative 
values greater than 10 _1 were set to 10 _1 and non-zero values 
less than 10~ 4 were set to 10~ 4 . Finally, since the RELM calcu- 
lations cannot handle zero-rate values, every box with zero historic 
seismicity was given a value of 10~ 5 . The RI map is shown in 
Figure UK. 

2. We next perform a pattern informatics calculation over the top 10% 
of most active sites in California using the ANSS catalog as input. 
For this calculation, we used to = 1 January 1950, ti = 1 January 
1985, and t 2 = 1 August 2005. Since the hotspots are where we ex- 
pect future earthquakes to occur, they are given a probability value 
of unity. The PI map is shown in Figure[Tfe. 

3. We then create a composite probability map by superimposing the 
PI map and its Moore neighborhood (the pixel plus its eight adja- 
cent neighbors) on top of the RI map. All the hotspot pixels have 
a probability of 1 , and all other pixels have probabilities that range 
from 10~ 5 to 10 _1 . The composite map is shown in Figure |2| 

4. To convert our pixel probabilities to earthquake occurrence proba- 
bilities, we first add up the probabilities in all pixels in the region 
and call this sum N. We then normalize this total to the expected 
number of M > 5.0 earthquakes during the forecast period. We 
estimate four to eight such events per year and assume 30 such 
events during a five year period. In order to do this, we multiply 
each pixel probability by 30 /N to give our RELM forecast. We 
then use Gutenberg-Richter scaling to interpolate these rates into 
the appropriate magnitude bins specified by the RELM test. 
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6 Discussion 

Ultimately there exists the fundamental question of whether forecasts of 
the time and location of future earthquakes can be accurately made. It 
is accepted that long term hazard maps of the expected rate of occur- 
rence of earthquakes are reasonably accurate. But is it possible to do 
better? Are there precursory phenomena that will allow earthquakes to 
be forecast? 

It is actually quite surprising that immediate local precursory phenom- 
ena are not seen. Prior to a volcanic eruption, increases in regional seis- 
micity and surface movements are generally observed. For a fault sys- 
tem, the stress gradually increases until it reaches the frictional strength 
of the fault and a rapture is initiated. It is certainly reasonable to hy- 
pothesize that the stress increase would cause increases in background 
seismicity and aseismic slip. In order to test this hypothesis the Park- 
field Earthquake Prediction Experiment was initiated in 1985. The ex- 
pected Parkfield earthquake occurred beneath the heavily instrumented 
region on 28 September 2004. No local precursory changes were ob- 
served lLindhll2005l) . 

In the absence of local precursory signals, the next question is whether 
broader anomalies develop, and in particular whether there is anomalous 
seismic activity. It is this question that is addressed in this paper. Using 
a technique that has been successfully applied to the forecasting of El 
Nino we have developed a systematic pattern informatics (PI) approach 
to the identification of regions of anomalous seismic activity. Applica- 
tions of this technique to California, Japan, and on a world-wide basis 
have successfully forecast the location of future earthquakes. We em- 
phasize that this is not an earthquake prediction. It is a forecast of where 
future earthquakes are expected to occur during a future time window of 



five to ten years. The objective is to reduce the possible future sites of 
earthquakes relative to a long term hazard assessment map. 
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Appendix A - Forecast verification 

Along with the RELM model, previous publish ed tests of ea rthquake 
forecasts have emphasized the likelihood test i Kaaan and Jackson, 2000; 
iRundle et alll2002HTiampo et alll2002cllHollidav et allteOOSft TAs dis- 
cussed above, these tests have the significant disadvantage that they are 
overly sensitive to the least probable events. For this reason, likelihood 
tests are subject to unconscious bias. 

An extensive review on forecast verification in the a tmospheric sci- 
ences has been given by Jolliffe and SteDhenson 1 2003 ). The wide va- 
riety of approaches that they consider are directly applicable to earth- 
quake forecasts as well. We believe that many of these approaches are 
better suited to the evaluation of earthquake forecasts. The earthquake 
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Figure 2: Composite forecast map. The scaled PI and RI maps have been combined, and boxes outside the testing region have been discarded. 



forecasts considered in this paper can be viewed as binary forecasts by 
considering the events (earthquakes) as being forecast either to occur or 
not to occur in a given box. We consider that there are four possible 
outcomes for each box, thus two ways to classify each hotspot, box, and 
two ways to classify each non-hotspot, box: 

1. An event occurs in a hotspot box or within the Moore neighbor- 
hood of the box (the Moore neighborhood is comprised of the eight 
boxes surrounding the forecast box). This is a success. 

2. No event occurs in a white non-hotspot box. This is also a success. 

3. No event occurs in a hotspot box or within the Moore neighborhood 
of the hotspot box. This is a false alarm. 

4. An event occurs in a white, non-hotspot box. This is a failure to 
forecast. 



We note that these rules tend to give credit, as successful forecasts, 
for events that occur very near hotspot boxes. We have adopted these 
rules in part because the grid of boxes is positioned arbitrarily on the 
seismically active region, thus we allow a margin of error of ±1 box 
dimension. In addition, the events we are forecasting are large enough 
so that their source dimension approaches, and can even exceed, the 
box dimension meaning that an event might have its epicenter outside a 
hotspot box, but the rupture might then propagate into the box. Other 
similar rules are possible but we have found that all such rules basically 
lead to similar results. 

The standard approach to the evaluation of a binary forecast is the 
use of a relative operating characteristic (ROC) diagram <Swetstll97l 
lMasorll2003T) . Standard ROC diagrams consider the fraction of failures- 
to-predict and the fraction of false alarms. This method evaluates the 
performance of the forecast method relative to random chance by con- 
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structing a plot of the fraction of failures to predict against the fraction 
of false alarms for an ensemble of forecasts. Molchan (Molchanl.119971) 
has used a modification of this method to evaluate the success of inter- 
mediate term earthquake forecasts. 

The binary approach h as a long histo ry, over 100 years, in the verifica- 
tion of tornado forecasts I Mason , 2003 ). These forecasts take the form of 
a tornado forecast for a specific location and time interval, each forecast 
having a binary set of possible outcomes. For example, during a given 
time window of several hours duration, a forecast is issued in which a 
list of counties is given with a statement that one or more tornadoes will 
or will not occur. A 2 X 2 contingency table is then constructed, the 
top row contains the counties in which tornadoes are forecast to occur 
and the bottom row contains counties in which tornadoes are forecast to 
not occur. Similarly, the left column represents counties in which torna- 
does were actually observed, and the right column represents counties in 
which no tornadoes were observed. 

With respect to earthquakes, our forecasts take exactly this form. A 
time window is proposed during which the forecast of large earthquakes 
having a magnitude above some minimum threshold is considered valid. 
An example might be a forecast of earthquakes larger than M = 5 
during a period of five or ten years duration. A map of the seismi- 
cally active region is then completely covered ("tiled") with boxes of 
two types: boxes in which the epicenters of at least one large earth- 
quake are forecast to occur and boxes in which large earthquakes are 
forecast to not occur. In other types of forecasts, large earthquakes are 
given some continuous probability of occurrence from 0% to 100% in 
each box iKagan and Jackson, 2000). These forecasts can be converted 
to the binary type by the application of a threshold value. Boxes having 
a probability below the threshold are assigned a forecast rating of non- 
occurrence during the time window, while boxes having a probability 
above the threshold are assigned a forecast rating of occurrence. A high 
threshold value may lead to many failures to predict (events that occur 
where no event is forecast), but few false alarms (an event is forecast at 
a location but no event occurs). The level at which the threshold is set 
is then a matter of public policy specified by emergency planners, repre- 
senting a balance between the prevalence of failures to predict and false 
alarms. 

Appendix B - Binary earthquake forecast verifi- 
cation 

To illustrate this approach to earthquake forecast verification, we have 
constructed two types of retrospective binary forecasts for California. 
The first type of forecas t utilizes the PI resu lts published by Rundle et 
al. and Tiampo et al. iRundle et all I2002L Ifiampo et all l2002d) for 
southern California and adjacent regions (32° to 38.3° N lat, 238° to 
245° E long). This forecast was constructed for the time period 1 Jan- 
uary 2000 to 3 1 December 2009, but we performed an interim analysis 
using data up to the present. The second type of forecast utilizes the RI 
results with the same parameter thresholds. 

The first step in our generation of ROC diagrams is the construction 
of the 2x2 contingency table for the PI and RI forecast maps. The 
hotspot boxes in each map represent the forecast locations. A hotspot 
box upon which at least one large future earthquake during the forecast 
period occurs is counted as a successful forecast. A hotspot box upon 
which no large future earthquake occurs during the forecast period is 
counted as an unsuccessfid forecast, or alternately, a false alarm. A 
white box upon which at least one large future earthquake during the 
forecast period occurs is counted as a failure to forecast. A white box 
upon which no large future earthquake occurs during the forecast period 
is counted as a unsuccessful forecast of non-occurrence. 

Verification of the PI and RI forecasts proceeds in exactly the same 
was as for tornado forecasts. For a given number of hotspot boxes, which 
is controlled by the value of the probability threshold in each map, the 



Table 1 : Contingency tables as a function of false alarm rate. In Ta- 
blefTK. a threshold value was chosen such that F 0.005. In TablelTB. 
a threshold value was chosen such that F ~ 0.021. 
(A) 



Pattern informatics (PI) forecast 



Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 4 


(b) 25 


29 


No 


(c) 13 


(d) 4998 


5011 


Total 


17 


5023 


5040 


Relative intensity (RI) forecast 


Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 2 


(b) 27 


29 


No 


(c) 14 


(d) 4997 


5011 


Total 


16 


5024 


5040 



(B) 

Pattern informatics (PI) forecast 



Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 23 


(b) 104 


127 


No 


(c)9 


(d) 4904 


4913 


Total 


32 


5008 


5040 


Relative intensity (RI) forecast 


Forecast 


Observed 




Yes 


No 


Total 


Yes 


(a) 20 


(b) 107 


127 


No 


(c) 10 


(d) 4903 


4913 


Total 


30 


5010 


5040 



contingency table (see Table is constructed for both the PI and RI 
maps. Values for the table elements a (Forecast=yes, Observed=yes), 
b (Forecast=yes, Observed=no), c (Forecast=no, Observed=yes), and d 
(Forecast=no, Observed=no) are obtained for each map. The fraction of 
colored boxes, also called the probability of forecast of occurrence, is 
r = (a + b)/N, where the total number of boxes is TV = a + b + c + d. 
The hit rate is H = a/ (a + c) and is the fraction of large earthquakes 
that occur on a hotspot. The false alarm rate is _P = b/(b + d) and is 
the fraction of non-observed earthquakes that are incorrectly forecast. 

To analyze the information in the PI and RI maps, the standard pro- 
cedure is to consider all possible forecasts together. These are obtained 
by increasing F from (corresponding to no hotspots on the map) to 1 
(all active boxes on the map are identified as hotspots). The plot of H 
versus F is the relative operating characteristic (ROC) diagram. Vary- 
ing the threshold value for both the PI and RI forecasts, we have ob- 
tained the values of H and F given in Figure|5] The results correspond- 
ing to the contingency tables given in Table □ are given by the filled 
symbols. The forecast with 29 hotspot boxes has Fpj = 0.00498, 
H PI = 0.235 and F RI = 0.00537, H RI = 0.125. The fore- 
cast with 127 hotspot boxes has F PI = 0.0207, H P1 = 0.719 and 
Fri = 0.0213, Hri = 0.666. Also shown in Figurel3lis a gain curve 
defined by the ratio of Hpi(F) to Hjh(F). Gain values greater than 
unity indicate better performance using the PI map than using the RI 
map. The horizontal dashed line corresponds to zero gain. From Fig- 
ure|3]it can be seen that the PI approach outperforms (is above) the RI 
under many circumstances and both outperform a random map, where 
H = F, by a large margin. For reference, ROC diagrams using the 
modified method discussed in the main text for the same forecast period 
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0.005 0.01 0.015 0.02 0.025 0.03 

F 

Figure 3: Relative operating characteristic (ROC) diagram. Plot of hit 
rates, H, versus false alarm rates, F, for the PI forecast and RI forecast. 
Also shown is the gain ratio defined as Hpj(F) / Hrj(F). The filled 
symbols correspond to the threshold values used in Table fTI solid cir- 
cles for 29 hotspot boxes and solid squares for 127 hotspot boxes. The 
horizontal dashed line corresponds to zero gain. 

are given in Figure Note that a different input catalog was used for 
this analysis. Also note that in this case, the PI approach outperforms 
the RI under all circumstances. 
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